Tips dataset
| Variable | Explanation |
|---|---|
| obs | Observation number |
| totbill | Total bill (cost of the meal), including tax, in US dollars |
| tip | Tip (gratuity) in US dollars |
| sex | Sex of person paying for the meal (0=male, 1=female) |
| smoker | Smoker in party? (0=No, 1=Yes) |
| day | 3=Thur, 4=Fri, 5=Sat, 6=Sun |
| time | 0=Day, 1=Night |
| size | Size of the party |
## term estimate std.error statistic p.value
## 1 (Intercept) 0.20656 0.02492 8.2892 8.65e-15
## 2 sexM -0.00854 0.00835 -1.0234 3.07e-01
## 3 smokerYes 0.00364 0.00850 0.4280 6.69e-01
## 4 daySat -0.00177 0.01834 -0.0967 9.23e-01
## 5 daySun 0.01667 0.01902 0.8764 3.82e-01
## 6 dayThu -0.01818 0.02319 -0.7837 4.34e-01
## 7 timeNight -0.02337 0.02612 -0.8948 3.72e-01
## 8 size -0.00962 0.00422 -2.2824 2.34e-02
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## 1 0.042 0.0136 0.0607 1.48 0.175 8 342 -665 -634
## deviance df.residual
## 1 0.868 236
Arithmetic mean
\[\bar{x} = \frac{1}{n} \sum_{i = 1}^n x_i\]
Variance
\[E[X] = \mu\]
\[\text{Var}(X) \equiv \sigma^2 = E[X^2] - (E[X])^2\]
Standard deviation
\[\sigma = \sqrt{E[X^2] - (E[X])^2}\]
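The mean, variance, and standard deviation formulas above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the data values are illustrative assumptions, and the variance is the population form (dividing by n), which is what the identity E[X^2] - (E[X])^2 yields.

```python
import math

def mean(xs):
    """Arithmetic mean: (1/n) * sum of x_i."""
    return sum(xs) / len(xs)

def variance(xs):
    """Population variance computed as E[X^2] - (E[X])^2."""
    ex = mean(xs)                      # E[X]
    ex2 = mean([x * x for x in xs])    # E[X^2]
    return ex2 - ex ** 2

def std_dev(xs):
    """Standard deviation: square root of the population variance."""
    return math.sqrt(variance(xs))

# Illustrative values (an assumption, not taken from the tips data).
values = [10, 20, 40, 30]
print(mean(values), variance(values), std_dev(values))
```

For these four values, E[X] = 25 and E[X^2] = 750, so the variance is 750 - 625 = 125. A sample variance would instead divide by n - 1.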
Median absolute deviation
\[MAD = \text{median}(|X_i - \text{median}(X)|)\]
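As a rough illustration of why the median absolute deviation is considered a robust measure of spread, the sketch below computes it for a small sample with one extreme value. The helper name `mad` and the data are assumptions made for this example.

```python
import statistics

def mad(xs):
    """Median absolute deviation: median(|x_i - median(x)|)."""
    med = statistics.median(xs)
    return statistics.median([abs(x - med) for x in xs])

# Illustrative values with one outlier (an assumption, not real data).
values = [10, 20, 40, 30, 500]

print(mad(values))                 # spread measured around the median
print(statistics.pstdev(values))   # population standard deviation, for contrast
```

The single outlier inflates the standard deviation considerably, while the MAD depends only on the middle of the sorted absolute deviations.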
Nonparametric density estimation
\[x_0 + 2(j - 1)h \leq X_i < x_0 + 2jh\]
\[\hat{p}(x) = \frac{\#_{i = 1}^n [x_0 + 2(j - 1)h \leq X_i < x_0 + 2jh]}{2nh}\]
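The histogram estimator above counts the observations falling in the bin of width 2h that contains x and divides by 2nh. A minimal sketch, assuming an origin `x0`, a half-width `h`, and made-up data (all names here are hypothetical):

```python
import math

def histogram_density(x, data, x0, h):
    """Histogram density estimate at x.

    Bins are [x0 + 2(j-1)h, x0 + 2jh) for j = 1, 2, ..., so each bin
    has width 2h. The estimate is (count in the bin holding x) / (2nh).
    """
    width = 2 * h
    j = math.floor((x - x0) / width) + 1   # 1-based index of the bin holding x
    lower = x0 + (j - 1) * width
    upper = x0 + j * width
    count = sum(1 for xi in data if lower <= xi < upper)
    return count / (2 * len(data) * h)

# Illustrative data (an assumption for this sketch).
data = [1.0, 1.5, 2.5, 3.0, 3.5, 7.0]

# With x0 = 0 and h = 1 the bins are [0, 2), [2, 4), [4, 6), [6, 8).
for x in (1.0, 3.0, 5.0, 7.0):
    print(x, histogram_density(x, data, x0=0.0, h=1.0))
```

Because every point in a bin receives the same estimate, the result depends on the choice of origin `x0` as well as on `h`.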
\[\hat{p}(x) = \frac{1}{nh} \sum_{i = 1}^n W \left( \frac{x - X_i}{h} \right)\]
\[W(z) = \begin{cases} \frac{1}{2} & \text{for } |z| < 1 \\ 0 & \text{otherwise} \\ \end{cases}\]
\[z = \frac{x - X_i}{h}\]
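The naive estimator replaces fixed bins with a window of half-width h centered on x itself. The sketch below implements the weight function W and the estimator as written above; the function names and data are assumptions for illustration.

```python
def W(z):
    """Rectangular weight: 1/2 when |z| < 1, otherwise 0."""
    return 0.5 if abs(z) < 1 else 0.0

def naive_density(x, data, h):
    """Naive density estimate: (1 / (n h)) * sum of W((x - X_i) / h)."""
    n = len(data)
    return sum(W((x - xi) / h) for xi in data) / (n * h)

# Illustrative data (an assumption for this sketch).
data = [1.0, 1.5, 2.5, 3.0, 3.5, 7.0]

# Observations strictly within h = 1 of x = 3.0 are 2.5, 3.0 and 3.5.
print(naive_density(3.0, data, h=1.0))
```

Unlike the histogram, this estimate does not depend on a bin origin, but it is still discontinuous because W jumps at |z| = 1.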
Kernels
\[\hat{p}(x) = \frac{1}{nh} \sum_{i = 1}^n K \left( \frac{x - X_i}{h} \right)\]
Gaussian kernel
\[K(z) = \frac{1}{\sqrt{2 \pi}}e^{-\frac{1}{2} z^2}\]
Rectangular (uniform) kernel
\[K(z) = \frac{1}{2} \mathbf{1}_{\{ |z| \leq 1 \} }\]
Triangular kernel
\[K(z) = (1 - |z|) \mathbf{1}_{\{ |z| \leq 1 \} }\]
Biweight (quartic) kernel
\[K(z) = \frac{15}{16} (1 - z^2)^2 \mathbf{1}_{\{ |z| \leq 1 \} }\]
Epanechnikov kernel
\[K(z) = \frac{3}{4} (1 - z^2) \mathbf{1}_{\{ |z| \leq 1 \} }\]
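The five kernel formulas above (commonly called the Gaussian, rectangular, triangular, biweight, and Epanechnikov kernels) can be plugged into the same estimator, which averages K((x - X_i)/h) over all n observations and divides by h. The following is a sketch under that reading; the function names, the data, and the crude midpoint-rule integrator are assumptions added for illustration.

```python
import math

def gaussian(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def rectangular(z):
    return 0.5 if abs(z) <= 1 else 0.0

def triangular(z):
    return 1 - abs(z) if abs(z) <= 1 else 0.0

def biweight(z):
    return 15 / 16 * (1 - z * z) ** 2 if abs(z) <= 1 else 0.0

def epanechnikov(z):
    return 0.75 * (1 - z * z) if abs(z) <= 1 else 0.0

def kde(x, data, h, kernel=gaussian):
    """Kernel density estimate: (1 / (n h)) * sum of K((x - X_i) / h)."""
    n = len(data)
    return sum(kernel((x - xi) / h) for xi in data) / (n * h)

def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    dx = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * dx) for i in range(steps)) * dx

# Illustrative data (an assumption for this sketch).
data = [1.0, 1.5, 2.5, 3.0, 3.5, 7.0]

for k in (gaussian, rectangular, triangular, biweight, epanechnikov):
    # Each kernel should integrate to roughly 1, so the estimate is a density.
    print(k.__name__, round(integrate(k, -8, 8), 3), kde(3.0, data, 1.0, k))
```

Smooth kernels such as the Gaussian give a smooth estimate, whereas the rectangular kernel reproduces the step-like behavior of the naive estimator.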
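\[h = 0.9 \sigma n^{-1 / 5}\]
Because \(\sigma\) is unknown in practice, it is replaced by the adaptive spread estimate \(A\), where \(S\) is the sample standard deviation and \(IQR\) is the interquartile range:
\[A = \min \left( S, \frac{IQR}{1.349} \right)\]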
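One common reading of the two formulas above is that the spread estimate A stands in for σ in the bandwidth rule, giving h = 0.9 A n^(-1/5). The sketch below follows that assumption. The function name is hypothetical, S is taken to be the sample standard deviation, and the quartiles come from `statistics.quantiles` with its default method, which may differ slightly from other quartile definitions.

```python
import statistics

def rule_of_thumb_bandwidth(data):
    """Bandwidth h = 0.9 * A * n^(-1/5), with A = min(S, IQR / 1.349).

    Assumes A is used in place of sigma; S is the sample standard
    deviation and IQR is the distance between the first and third quartiles.
    """
    n = len(data)
    s = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    a = min(s, iqr / 1.349)
    return 0.9 * a * n ** (-1 / 5)

# Illustrative data (assumptions for this sketch).
data = [1.0, 1.5, 2.5, 3.0, 3.5, 7.0]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]

print(rule_of_thumb_bandwidth(data))
print(rule_of_thumb_bandwidth(with_outlier))
```

Taking the minimum of S and IQR/1.349 keeps a few extreme observations from inflating the bandwidth, since the IQR-based term is far less sensitive to outliers than S.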
## # A tibble: 3 x 24
## title year length budget
## <chr> <int> <int> <int>
## 1 Cure for Insomnia, The 1987 5220 NA
## 2 Four Stars 1967 1100 NA
## 3 Longest Most Meaningless Movie in the World, The 1970 2880 NA
## # ... with 20 more variables: rating <dbl>, votes <int>, r1 <dbl>,
## # r2 <dbl>, r3 <dbl>, r4 <dbl>, r5 <dbl>, r6 <dbl>, r7 <dbl>, r8 <dbl>,
## # r9 <dbl>, r10 <dbl>, mpaa <chr>, Action <int>, Animation <int>,
## # Comedy <int>, Drama <int>, Documentary <int>, Romance <int>,
## # Short <int>
| A | B | C | D |
|---|---|---|---|
| 10 | 20 | 40 | 30 |
movies example